NSF PAR Search | NSF Public Access Repository

PDIP: Priority Directed Instruction Prefetching

https://doi.org/10.1145/3620665.3640394

Godala, Bhargav Reddy; Ramesh, Sankara Prasad; Pokam, Gilles A.; Stark, Jared; Seznec, Andre; Tullsen, Dean; August, David I. (April 2024, ACM)

Modern server workloads have large code footprints that are prone to front-end bottlenecks due to instruction cache capacity misses. Even with the aggressive fetch directed instruction prefetching (FDIP) implemented in modern processors, there are still significant front-end stalls due to I-Cache misses. A major portion of the misses that occur on a BPU-predicted path are tolerated by FDIP without causing stalls. Prior work on instruction prefetching, however, has not been designed to work with FDIP processors. Its sole goal is reducing I-Cache misses, whereas FDIP processors are designed to tolerate them. Designing an instruction prefetcher that works in conjunction with FDIP requires identifying the fraction of cache misses that impact front-end performance (those that are not fully hidden by FDIP) and targeting only them. In this paper, we propose Priority Directed Instruction Prefetching (PDIP), a novel instruction prefetching technique that complements FDIP by issuing prefetches only for targets where FDIP struggles: those along the resteer path of front-end stall-causing events. PDIP identifies these targets and associates them with a trigger for future prefetch. At a 43.5 KB budget, PDIP achieves up to 5.1% IPC speedup on important workloads such as cassandra and a geomean IPC speedup of 3.2% across 16 benchmarks.
Full Text Available
GhOST: a GPU Out-of-Order Scheduling Technique for Stall Reduction

https://doi.org/10.1109/ISCA59077.2024.00011

Chaturvedi, Ishita; Godala, Bhargav Reddy; Wu, Yucan; Xu, Ziyang; Iliakis, Konstantinos; Eleftherakis, Panagiotis-Eleftherios; Xydis, Sotirios; Soudris, Dimitrios; Sorensen, Tyler; Campanoni, Simone; et al. (June 2024, IEEE)

Full Text Available
Revisiting Computation for Research: Practices and Trends

https://doi.org/10.1109/SC41406.2024.00076

Giordani, Jeremiah; Xu, Ziyang; Colby, Ella; Ning, August; Godala, Bhargav Reddy; Chaturvedi, Ishita; Zhu, Shaowei; Chon, Yebin; Chan, Greg; Tan, Zujun; et al. (November 2024, IEEE)

Full Text Available
EMISSARY: Enhanced Miss Awareness Replacement Policy for L2 Instruction Caching

https://doi.org/10.1145/3579371.3589097

Nagendra, Nayana Prasad; Godala, Bhargav Reddy; Chaturvedi, Ishita; Patel, Atmn; Kanev, Svilen; Moseley, Tipp; Stark, Jared; Pokam, Gilles A.; Campanoni, Simone; August, David I. (June 2023, Proceedings of the 50th International Symposium on Computer Architecture (ISCA))

Full Text Available